fix: raise AmbiguousReference error for duplicate column names in subquery by xiedeyantu · Pull Request #21236 · apache/datafusion

xiedeyantu · 2026-03-29T13:49:13Z

Which issue does this PR close?

Closes Unambiguous Column Reference Error Not Triggered in SQL Query #21232.

Rationale for this change

When two joined tables share a column with the same name (e.g. age), a SELECT * inside a derived table subquery produces duplicate column names. Previously, referencing such a column by its unqualified name from the outer query silently succeeded instead of raising an ambiguity error, violating standard SQL semantics.

What changes are included in this PR?

Added an ambiguous_names: HashSet<String> field to DFSchema to track column names that are structurally ambiguous in a given schema context.
Added DFSchema::with_ambiguous_names (builder) and DFSchema::ambiguous_names (accessor) methods.
In SubqueryAlias::try_new, after unique_field_aliases renames duplicate columns to keep the Arrow schema valid, the original (pre-rename) names are collected into ambiguous_names and attached to the output schema.
In DFSchema::qualified_field_with_unqualified_name, any lookup of an ambiguous name now immediately returns SchemaError::AmbiguousReference.
In Column::normalize_with_schemas_and_ambiguity_check, even a single structural match is rejected when the containing schema has flagged the name as ambiguous.
Updated the bad_extension_planner snapshot test to include the new ambiguous_names field in the DFSchema debug output.

Are these changes tested?

The existing join_with_ambiguous_column, order_by_ambiguous_name, and group_by_ambiguous_name tests continue to pass. A new test case covering the reported scenario (select age from (SELECT * FROM a join b on a.aid = b.bid) as t) should be added to datafusion/sql/tests/sql_integration.rs.

Are there any user-facing changes?

Yes. Queries that previously silently resolved an ambiguous column reference through a derived-table subquery will now receive a Schema error: Ambiguous reference to unqualified field <name> error, consistent with standard SQL behavior and with how DataFusion already handles the same ambiguity at the direct JOIN level.

xudong963

Could you please add some slt tests? multiple joins (2, 3 etc) with duplicated columns

xudong963 · 2026-03-30T08:01:19Z

datafusion/expr/src/logical_plan/plan.rs

        let aliases = unique_field_aliases(plan.schema().fields());
        let is_projection_needed = aliases.iter().any(Option::is_some);

+        // Collect the set of unqualified field names that are ambiguous in this


Here marks a name as ambiguous when unique_field_aliases provides a rename for it. But unique_field_aliases renames all duplicates — so if there are 3 columns named age, the 2nd and 3rd get renamed.

The code collects field.name() for each renamed field, which means it collects "age" twice and puts it in the set. That works correctly due to HashSet dedup, but the first age (which was NOT renamed) is also ambiguous and only ends up in the set because one of the later duplicates shares its name. This is coincidentally correct but fragile — if unique_field_aliases ever changed to rename ALL duplicates (including the first), or if it renamed to something other than name:N, the logic could break. 🤔

A cleaner approach: count occurrences of each name and mark any name appearing 2+ times.

@xudong963 Thank you very much for your review and excellent suggestions. My initial intention for this PR was to eliminate the unique naming convention—specifically, the name:N format—but it appears to play a critical role internally (it would be great if you could provide some further details on this). Consequently, I introduced an additional ambiguous_names field to track duplicate column names. To be honest, I felt this approach lacked elegance, but I couldn't come up with a better alternative at the time. Having reviewed your suggestions, however, I now believe they offer a superior solution; I will proceed to refactor this PR based on that approach. I will also add the corresponding SLT tests.

xiedeyantu · 2026-03-30T14:33:51Z

@xudong963 I have made the changes based on your suggestions and added tests. There is one failing item in the CI, but it appears unrelated to the current PR, as the main branch shows the same failure. Could you please review this again? Thanks!

xudong963

Looks like nested derived tables still lose the ambiguity marker after one extra SELECT * wrapper: such as SELECT age FROM ( SELECT * FROM ( SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id ) AS s1 ) AS s2;

And for join, how about the case SELECT age FROM (SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id) sub JOIN seed ON true

xiedeyantu · 2026-04-01T11:57:51Z

Looks like nested derived tables still lose the ambiguity marker after one extra SELECT * wrapper: such as SELECT age FROM ( SELECT * FROM ( SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id ) AS s1 ) AS s2;

And for join, how about the case SELECT age FROM (SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id) sub JOIN seed ON true

@xudong963 Thanks! Good catch, I have fixed this issue and added tests for nested derived tables and subquery JOIN cases.

xudong963

ORDER BY still drops the ambiguity marker when it falls back from the select-list schema to the underlying FROM schema. order_by_to_sort_expr clones the projected schema and then calls DFSchema::merge, but merge still never unions other_schema.ambiguous_names

For example:

SELECT score
FROM (SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id) sub
ORDER BY age;

xiedeyantu · 2026-04-03T08:09:19Z

ORDER BY still drops the ambiguity marker when it falls back from the select-list schema to the underlying FROM schema. order_by_to_sort_expr clones the projected schema and then calls DFSchema::merge, but merge still never unions other_schema.ambiguous_names

For example:
SELECT score
FROM (SELECT * FROM t_left JOIN t_right ON t_left.id = t_right.id) sub
ORDER BY age;

@xudong963 Thanks! I have fixed this issue via commit 5ab14b2.

xudong963

Good for me, thanks

alamb · 2026-04-04T10:03:17Z

run benchmark sql_planner

alamb

thanks @xiedeyantu and @xudong963 - I am worried about the implications of adding a new field to DFSchema for what appears to be a fairly narrow usecase (as in the cost will be high but the benefit relatively small)

Is there some way to improve the messages without having to pre-compute this field? For example, perhaps compute the ambiguous names (only) when producing the error message

alamb · 2026-04-04T10:02:54Z

datafusion/common/src/dfschema.rs

+    /// source (e.g. a derived-table subquery) contained multiple columns with
+    /// the same unqualified name.  Any attempt to reference these names without
+    /// a qualifier should produce an [`SchemaError::AmbiguousReference`] error.
+    ambiguous_names: HashSet<String>,


DFSchema is used many places in this codebase -- I am worried that adding a new HashSet will cause non trivial slowdown in planning. I will run some benchmarks on this one

this is a good point!

adriangbot · 2026-04-04T10:06:48Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4186874235-784-nhqkf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing ambiguous-names (5ab14b2) to bc2b36c (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

xiedeyantu · 2026-04-04T10:37:31Z

thanks @xiedeyantu and @xudong963 - I am worried about the implications of adding a new field to DFSchema for what appears to be a fairly narrow usecase (as in the cost will be high but the benefit relatively small)

Is there some way to improve the messages without having to pre-compute this field? For example, perhaps compute the ambiguous names (only) when producing the error message

@alamb I haven't thought of a better approach for now. If this solution isn’t ideal, you can close this PR first, and I’ll keep thinking about whether there might be a better alternative.

adriangbot · 2026-04-04T11:01:32Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

group                                                 ambiguous-names                        main
-----                                                 ---------------                        ----
logical_aggregate_with_join                           1.05    456.7±1.72µs        ? ?/sec    1.00    433.6±1.08µs        ? ?/sec
logical_plan_struct_join_agg_sort                     1.09    183.0±0.96µs        ? ?/sec    1.00    167.3±0.67µs        ? ?/sec
logical_select_all_from_1000                          1.00      8.2±0.02ms        ? ?/sec    1.00      8.2±0.04ms        ? ?/sec
logical_select_one_from_700                           1.03    327.1±0.72µs        ? ?/sec    1.00    319.1±1.84µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.06    280.5±0.48µs        ? ?/sec    1.00    265.4±0.64µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.06    268.0±0.64µs        ? ?/sec    1.00    253.6±0.81µs        ? ?/sec
physical_intersection                                 1.03    601.8±1.97µs        ? ?/sec    1.00    583.4±3.59µs        ? ?/sec
physical_join_consider_sort                           1.06   1072.4±3.09µs        ? ?/sec    1.00   1012.1±3.16µs        ? ?/sec
physical_join_distinct                                1.06    262.1±0.52µs        ? ?/sec    1.00    246.9±0.86µs        ? ?/sec
physical_many_self_joins                              1.11      8.3±0.02ms        ? ?/sec    1.00      7.5±0.02ms        ? ?/sec
physical_plan_clickbench_all                          1.00    109.6±0.31ms        ? ?/sec    1.02    111.4±0.43ms        ? ?/sec
physical_plan_clickbench_q1                           1.02   1126.5±5.03µs        ? ?/sec    1.00   1104.9±3.85µs        ? ?/sec
physical_plan_clickbench_q10                          1.04  1838.7±24.96µs        ? ?/sec    1.00  1775.3±20.28µs        ? ?/sec
physical_plan_clickbench_q11                          1.03   1935.3±6.65µs        ? ?/sec    1.00   1888.0±5.65µs        ? ?/sec
physical_plan_clickbench_q12                          1.03      2.0±0.01ms        ? ?/sec    1.00   1962.1±4.77µs        ? ?/sec
physical_plan_clickbench_q13                          1.03   1777.7±5.90µs        ? ?/sec    1.00   1721.8±5.34µs        ? ?/sec
physical_plan_clickbench_q14                          1.03  1935.9±12.22µs        ? ?/sec    1.00   1877.3±4.90µs        ? ?/sec
physical_plan_clickbench_q15                          1.01   1805.7±9.72µs        ? ?/sec    1.00   1789.0±4.44µs        ? ?/sec
physical_plan_clickbench_q16                          1.01  1505.9±10.73µs        ? ?/sec    1.00   1493.4±4.83µs        ? ?/sec
physical_plan_clickbench_q17                          1.02   1579.7±6.90µs        ? ?/sec    1.00   1551.4±5.14µs        ? ?/sec
physical_plan_clickbench_q18                          1.00   1393.3±4.00µs        ? ?/sec    1.00  1398.8±11.83µs        ? ?/sec
physical_plan_clickbench_q19                          1.01   1771.6±5.02µs        ? ?/sec    1.00   1761.6±5.76µs        ? ?/sec
physical_plan_clickbench_q2                           1.00   1500.2±3.59µs        ? ?/sec    1.00   1499.8±6.13µs        ? ?/sec
physical_plan_clickbench_q20                          1.02   1232.3±3.15µs        ? ?/sec    1.00   1209.7±3.72µs        ? ?/sec
physical_plan_clickbench_q21                          1.03   1507.9±7.72µs        ? ?/sec    1.00   1463.6±4.17µs        ? ?/sec
physical_plan_clickbench_q22                          1.01   1905.0±5.59µs        ? ?/sec    1.00   1889.9±5.11µs        ? ?/sec
physical_plan_clickbench_q23                          1.01      2.1±0.00ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q24                          1.01      3.1±0.01ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
physical_plan_clickbench_q25                          1.01   1601.8±3.91µs        ? ?/sec    1.00   1586.2±4.66µs        ? ?/sec
physical_plan_clickbench_q26                          1.02   1439.9±3.72µs        ? ?/sec    1.00   1417.2±4.57µs        ? ?/sec
physical_plan_clickbench_q27                          1.00   1588.1±3.88µs        ? ?/sec    1.00   1582.2±3.96µs        ? ?/sec
physical_plan_clickbench_q28                          1.00      2.0±0.01ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      2.2±0.01ms        ? ?/sec    1.01      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q3                           1.00   1418.8±4.44µs        ? ?/sec    1.00   1421.4±6.56µs        ? ?/sec
physical_plan_clickbench_q30                          1.00     16.3±0.04ms        ? ?/sec    1.00     16.3±0.06ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      2.2±0.00ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q32                          1.01      2.2±0.01ms        ? ?/sec    1.00      2.1±0.00ms        ? ?/sec
physical_plan_clickbench_q33                          1.01  1787.7±18.99µs        ? ?/sec    1.00   1770.1±6.40µs        ? ?/sec
physical_plan_clickbench_q34                          1.00   1513.8±4.16µs        ? ?/sec    1.00   1509.7±4.08µs        ? ?/sec
physical_plan_clickbench_q35                          1.01   1577.3±4.29µs        ? ?/sec    1.00   1560.4±5.24µs        ? ?/sec
physical_plan_clickbench_q36                          1.05   1969.6±5.41µs        ? ?/sec    1.00   1881.8±6.01µs        ? ?/sec
physical_plan_clickbench_q37                          1.04      2.3±0.01ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
physical_plan_clickbench_q38                          1.04      2.3±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q39                          1.04      2.2±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q4                           1.00   1224.6±4.14µs        ? ?/sec    1.00   1225.6±5.20µs        ? ?/sec
physical_plan_clickbench_q40                          1.03      2.7±0.01ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_clickbench_q41                          1.01      2.4±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q42                          1.02      2.3±0.00ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
physical_plan_clickbench_q43                          1.03      2.5±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q44                          1.05   1338.0±7.69µs        ? ?/sec    1.00   1276.8±4.70µs        ? ?/sec
physical_plan_clickbench_q45                          1.05  1360.0±11.55µs        ? ?/sec    1.00   1299.1±5.16µs        ? ?/sec
physical_plan_clickbench_q46                          1.00   1640.6±4.00µs        ? ?/sec    1.01  1652.4±10.46µs        ? ?/sec
physical_plan_clickbench_q47                          1.00      2.3±0.01ms        ? ?/sec    1.01      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q48                          1.01      2.5±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q49                          1.01      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q5                           1.01   1364.9±8.32µs        ? ?/sec    1.00  1350.9±15.52µs        ? ?/sec
physical_plan_clickbench_q50                          1.00      2.5±0.01ms        ? ?/sec    1.01      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q51                          1.00   1734.4±4.32µs        ? ?/sec    1.01  1759.7±16.19µs        ? ?/sec
physical_plan_clickbench_q6                           1.03   1370.4±5.04µs        ? ?/sec    1.00   1330.6±8.37µs        ? ?/sec
physical_plan_clickbench_q7                           1.00   1141.7±3.83µs        ? ?/sec    1.00  1141.2±11.55µs        ? ?/sec
physical_plan_clickbench_q8                           1.03   1647.5±7.07µs        ? ?/sec    1.00   1601.6±8.27µs        ? ?/sec
physical_plan_clickbench_q9                           1.03   1686.0±6.37µs        ? ?/sec    1.00   1640.3±4.13µs        ? ?/sec
physical_plan_struct_join_agg_sort                    1.03   1383.0±9.10µs        ? ?/sec    1.00   1340.8±3.41µs        ? ?/sec
physical_plan_tpcds_all                               1.01    768.5±4.37ms        ? ?/sec    1.00    762.5±5.74ms        ? ?/sec
physical_plan_tpch_all                                1.02     48.3±0.08ms        ? ?/sec    1.00     47.5±0.13ms        ? ?/sec
physical_plan_tpch_q1                                 1.01   1553.8±4.71µs        ? ?/sec    1.00   1538.6±2.71µs        ? ?/sec
physical_plan_tpch_q10                                1.02      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_tpch_q11                                1.03      2.6±0.01ms        ? ?/sec    1.00      2.5±0.00ms        ? ?/sec
physical_plan_tpch_q12                                1.02   1306.6±7.06µs        ? ?/sec    1.00   1282.1±2.63µs        ? ?/sec
physical_plan_tpch_q13                                1.00   1000.6±2.18µs        ? ?/sec    1.00    996.0±3.09µs        ? ?/sec
physical_plan_tpch_q14                                1.04   1372.6±4.63µs        ? ?/sec    1.00   1314.6±2.84µs        ? ?/sec
physical_plan_tpch_q16                                1.02   1692.6±7.58µs        ? ?/sec    1.00   1662.3±2.97µs        ? ?/sec
physical_plan_tpch_q17                                1.03   1862.7±5.21µs        ? ?/sec    1.00   1814.5±5.23µs        ? ?/sec
physical_plan_tpch_q18                                1.02   1971.2±7.98µs        ? ?/sec    1.00   1934.4±6.59µs        ? ?/sec
physical_plan_tpch_q19                                1.02      2.5±0.00ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q2                                 1.01      4.5±0.01ms        ? ?/sec    1.00      4.5±0.01ms        ? ?/sec
physical_plan_tpch_q20                                1.05      2.4±0.00ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
physical_plan_tpch_q21                                1.03      3.1±0.00ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
physical_plan_tpch_q22                                1.01      2.1±0.00ms        ? ?/sec    1.00      2.1±0.00ms        ? ?/sec
physical_plan_tpch_q3                                 1.02   1964.4±6.61µs        ? ?/sec    1.00   1925.7±2.57µs        ? ?/sec
physical_plan_tpch_q4                                 1.01   1044.4±2.19µs        ? ?/sec    1.00   1030.0±2.53µs        ? ?/sec
physical_plan_tpch_q5                                 1.02      2.5±0.00ms        ? ?/sec    1.00      2.5±0.00ms        ? ?/sec
physical_plan_tpch_q6                                 1.04    658.3±1.53µs        ? ?/sec    1.00    634.3±1.92µs        ? ?/sec
physical_plan_tpch_q7                                 1.02      3.1±0.00ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q8                                 1.02      4.2±0.01ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q9                                 1.02      3.1±0.00ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     14.7±0.04ms        ? ?/sec    1.00     14.7±0.03ms        ? ?/sec
physical_select_all_from_1000                         1.01     18.1±0.08ms        ? ?/sec    1.00     18.0±0.07ms        ? ?/sec
physical_select_one_from_700                          1.04    807.7±2.77µs        ? ?/sec    1.00    773.7±2.46µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.01      4.8±0.01ms        ? ?/sec    1.00      4.8±0.01ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.01     11.8±0.06ms        ? ?/sec    1.00     11.7±0.04ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    118.1±0.79ms        ? ?/sec    1.00    118.7±3.30ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    617.0±3.54ms        ? ?/sec    1.01    625.2±8.12ms        ? ?/sec
physical_theta_join_consider_sort                     1.06   1108.3±3.41µs        ? ?/sec    1.00   1046.9±2.55µs        ? ?/sec
physical_unnest_to_join                               1.02   1247.1±2.26µs        ? ?/sec    1.00   1222.5±5.30µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00    752.1±2.13µs        ? ?/sec    1.00    749.9±1.97µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00   1503.9±2.98µs        ? ?/sec    1.00   1504.0±9.16µs        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    446.4±1.19µs        ? ?/sec    1.00    447.1±1.32µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    558.3±1.64µs        ? ?/sec    1.00    557.0±2.67µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    600.7±1.75µs        ? ?/sec    1.00    600.4±1.87µs        ? ?/sec
with_param_values_many_columns                        1.00    465.1±1.99µs        ? ?/sec    1.00    467.3±2.46µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	1260.9s
Peak memory	18.5 GiB
Avg memory	18.5 GiB
CPU user	1506.6s
CPU sys	1.8s
Disk read	0 B
Disk write	568.1 MiB

branch

Metric	Value
Wall time	1257.2s
Peak memory	18.5 GiB
Avg memory	18.5 GiB
CPU user	1502.5s
CPU sys	1.3s
Disk read	0 B
Disk write	23.5 MiB

File an issue against this benchmark runner

xiedeyantu · 2026-04-04T11:41:21Z

@alamb @xudong963 I see that the test results are now available. I'm not entirely sure how to interpret them, but based on a rough comparison with the baseline branch, it appears there is some impact. I'm not certain whether this discrepancy is acceptable. If you feel it is not, we can temporarily close this PR while I look for alternative solutions.

alamb · 2026-04-07T20:35:21Z

@alamb @xudong963 I see that the test results are now available. I'm not entirely sure how to interpret them, but based on a rough comparison with the baseline branch, it appears there is some impact. I'm not certain whether this discrepancy is acceptable. If you feel it is not, we can temporarily close this PR while I look for alternative solutions.

Yes I agree -- my interpretation of the benchmark results show that this PR causes a small (1-3%) slowdown in SQL planning. I don't think it is acceptable to slow down all query planning to improve an error message. (It is fine to slow down planning when the error was going to be generated, but we shouldn't take all good queries too)

Thank you very much for looking into this

alamb

I think we need to avoid performance regressions before merging this

xiedeyantu · 2026-04-07T22:28:47Z

I think we need to avoid performance regressions before merging this

Of course! I’ll think about other possible ways to fix this issue. Thanks for your expert feedback.

…query

xiedeyantu · 2026-04-08T14:07:27Z

run benchmark sql_planner

adriangbot · 2026-04-08T14:07:28Z

Hi @xiedeyantu, thanks for the request (#21236 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas.

File an issue against this benchmark runner

xiedeyantu · 2026-04-08T14:29:41Z

@alamb @xudong963 Could you help me rerun the benchmark?

alamb · 2026-04-08T18:39:41Z

run benchmark sql_planner

adriangbot · 2026-04-08T18:44:22Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4208654865-964-7jpl8 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing ambiguous-names (4e8a1a5) to 5ba06ac (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-04-08T19:16:32Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

Benchmarking physical_plan_clickbench_q34: Analyzing
Benchmarking physical_plan_clickbench_q35
Benchmarking physical_plan_clickbench_q35: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.3s, enable flat sampling, or reduce sample count to 50.
Benchmarking physical_plan_clickbench_q35: Collecting 100 samples in estimated 7.3013 s (5050 iterations)
Benchmarking physical_plan_clickbench_q35: Analyzing
Benchmarking physical_plan_clickbench_q36
Benchmarking physical_plan_clickbench_q36: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
Benchmarking physical_plan_clickbench_q36: Collecting 100 samples in estimated 8.9222 s (5050 iterations)
Benchmarking physical_plan_clickbench_q36: Analyzing
Benchmarking physical_plan_clickbench_q37
Benchmarking physical_plan_clickbench_q37: Warming up for 3.0000 s

thread 'main' (39590) panicked at datafusion/core/benches/sql_planner.rs:57:14:
called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 'simplify_expressions' failed", ArrowError(CastError("Cannot cast string '2013-07-01' to value of UInt16 type"), Some("")))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `-p datafusion --bench sql_planner`

File an issue against this benchmark runner

xiedeyantu · 2026-04-08T23:10:53Z

@alamb This seems to be related to commit fb85e35df. I've applied a temporary fix; let's see if it passes the benchmarks.

xudong963 · 2026-04-09T06:59:50Z

run benchmark sql_planner

adriangbot · 2026-04-09T07:02:11Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4212119392-980-gb5ph 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing ambiguous-names (0c8b96d) to 5ba06ac (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-04-09T07:28:18Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

Benchmarking physical_plan_clickbench_q34: Analyzing
Benchmarking physical_plan_clickbench_q35
Benchmarking physical_plan_clickbench_q35: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.4s, enable flat sampling, or reduce sample count to 50.
Benchmarking physical_plan_clickbench_q35: Collecting 100 samples in estimated 7.4015 s (5050 iterations)
Benchmarking physical_plan_clickbench_q35: Analyzing
Benchmarking physical_plan_clickbench_q36
Benchmarking physical_plan_clickbench_q36: Warming up for 3.0000 s

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
Benchmarking physical_plan_clickbench_q36: Collecting 100 samples in estimated 9.0547 s (5050 iterations)
Benchmarking physical_plan_clickbench_q36: Analyzing
Benchmarking physical_plan_clickbench_q37
Benchmarking physical_plan_clickbench_q37: Warming up for 3.0000 s

thread 'main' (37232) panicked at datafusion/core/benches/sql_planner.rs:57:14:
called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 'simplify_expressions' failed", ArrowError(CastError("Cannot cast string '2013-07-01' to value of UInt16 type"), Some("")))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `-p datafusion --bench sql_planner`

File an issue against this benchmark runner

alamb · 2026-04-09T11:36:20Z

Benchmarks appear to be failing due to a panic 🤔

called Result::unwrap() on an Err value: Context("Optimizer rule 'simplify_expressions' failed", ArrowError(CastError("Cannot cast string '2013-07-01' to value of UInt16 type"), Some("")))

I wonder if this is a problem on main too ?

xiedeyantu · 2026-04-09T11:55:00Z

Benchmarks appear to be failing due to a panic 🤔

called Result::unwrap() on an Err value: Context("Optimizer rule 'simplify_expressions' failed", ArrowError(CastError("Cannot cast string '2013-07-01' to value of UInt16 type"), Some("")))

I wonder if this is a problem on main too ?

@alamb I'm not entirely sure myself. I've created a new RP #21498 with only minor modifications—would it be possible to run a benchmark to test it out?

github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate common Related to common crate labels Mar 29, 2026

xudong963 changed the title ~~fix: raise AmbiguousReference error for duplicate column names in sub…~~ fix: raise AmbiguousReference error for duplicate column names in subquery Mar 30, 2026

xudong963 reviewed Mar 30, 2026

View reviewed changes

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 30, 2026

xiedeyantu requested a review from xudong963 March 30, 2026 23:04

xudong963 reviewed Apr 1, 2026

View reviewed changes

xudong963 reviewed Apr 3, 2026

View reviewed changes

github-actions bot added the sql SQL Planner label Apr 3, 2026

xudong963 approved these changes Apr 3, 2026

View reviewed changes

alamb reviewed Apr 4, 2026

View reviewed changes

alamb requested changes Apr 7, 2026

View reviewed changes

fix: raise AmbiguousReference error for duplicate column names in sub…

76150b2

…query

xiedeyantu force-pushed the ambiguous-names branch from 5ab14b2 to 76150b2 Compare April 8, 2026 14:06

xiedeyantu added 2 commits April 8, 2026 22:10

fix test

e3e3c55

fix fmt

4e8a1a5

fix panic

0c8b96d

xiedeyantu mentioned this pull request Apr 9, 2026

fix: DataFusion benchmark panicked: failed to cast '2013-07-01' to UInt16 #21498

Open

Conversation

xiedeyantu commented Mar 29, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

xudong963 left a comment

Choose a reason for hiding this comment

Uh oh!

xudong963 Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

xiedeyantu Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

xiedeyantu commented Mar 30, 2026

Uh oh!

xudong963 left a comment

Choose a reason for hiding this comment

Uh oh!

xiedeyantu commented Apr 1, 2026

Uh oh!

xudong963 left a comment

Choose a reason for hiding this comment

Uh oh!

xiedeyantu commented Apr 3, 2026

Uh oh!

xudong963 left a comment

Choose a reason for hiding this comment

Uh oh!

alamb commented Apr 4, 2026

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

xudong963 Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

adriangbot commented Apr 4, 2026

Uh oh!

xiedeyantu commented Apr 4, 2026

Uh oh!

adriangbot commented Apr 4, 2026

Uh oh!

xiedeyantu commented Apr 4, 2026

Uh oh!

alamb commented Apr 7, 2026

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

xiedeyantu commented Apr 7, 2026

Uh oh!

xiedeyantu commented Apr 8, 2026

Uh oh!

adriangbot commented Apr 8, 2026

Uh oh!

xiedeyantu commented Apr 8, 2026

Uh oh!

alamb commented Apr 8, 2026

Uh oh!

adriangbot commented Apr 8, 2026

Uh oh!

adriangbot commented Apr 8, 2026

Uh oh!

xiedeyantu commented Apr 8, 2026

Uh oh!

xudong963 commented Apr 9, 2026

Uh oh!

adriangbot commented Apr 9, 2026

Uh oh!

adriangbot commented Apr 9, 2026

Uh oh!

alamb commented Apr 9, 2026

Uh oh!